Effective Skyline Cardinality Estimation on Data Streams

Authors

  • Yang Lu
  • Jiakui Zhao
  • Lijun Chen
  • Bin Cui
  • Dongqing Yang
Abstract

To incorporate the skyline operator into a data stream engine, we must address the problem of skyline cardinality estimation, which is essential for extending the query optimizer’s cost model to accommodate skyline queries. In this paper, we propose robust approaches for estimating the skyline cardinality over sliding windows in a stream environment. We first design an approach to estimate the skyline cardinality over uniformly distributed data, and then extend it to support arbitrarily distributed data. Because our approaches make no assumption about the data distribution, they can be applied to extend the optimizer’s cost model. To estimate the skyline cardinality in an online manner, the live elements in the sliding window are sketched using Spectral Bloom Filters, which efficiently and effectively capture the information essential for estimating the skyline cardinality over sliding windows. An extensive experimental study demonstrates that our approaches significantly outperform previous approaches.
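As a rough illustration of the sketching step mentioned in the abstract, the following minimal Python sketch shows how a Spectral Bloom Filter (an array of counters queried with the minimum-selection rule) could track the live elements of a count-based sliding window. It is not the authors' implementation: the class names `SpectralBloomFilter` and `SketchedWindow`, the parameters `num_counters` and `num_hashes`, and the choice of BLAKE2b hashing are all assumptions made for the example, and the step that derives a skyline cardinality estimate from the sketch is not shown because the abstract does not describe it.

```python
import hashlib
from collections import deque


class SpectralBloomFilter:
    """Counter array with k hash functions (hypothetical minimal version).

    The frequency estimate of an item is the minimum of its k counters,
    which never underestimates the true count.
    """

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive k counter indices from salted BLAKE2b digests of the item.
        data = repr(item).encode("utf-8")
        for i in range(self.k):
            salt = i.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):
        # Only call this for items that were previously added.
        for p in self._positions(item):
            self.counters[p] -= 1

    def estimate(self, item):
        return min(self.counters[p] for p in self._positions(item))


class SketchedWindow:
    """Count-based sliding window whose live elements are mirrored in a sketch.

    The deque is kept here only so the example knows which element expires;
    it is an assumption of this sketch, not part of the filter itself.
    """

    def __init__(self, size, sbf):
        self.size = size
        self.sbf = sbf
        self.live = deque()

    def push(self, item):
        self.live.append(item)
        self.sbf.add(item)
        if len(self.live) > self.size:
            self.sbf.remove(self.live.popleft())


if __name__ == "__main__":
    window = SketchedWindow(size=3, sbf=SpectralBloomFilter())
    for point in [(1, 2), (1, 2), (3, 4), (5, 6)]:
        window.push(point)
    # One copy of (1, 2) has expired; the estimate is an upper bound on 1.
    print(window.sbf.estimate((1, 2)))
```

Because expired elements are decremented as new ones arrive, the counters always reflect only the current window, which is the property an online estimator would rely on.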


Similar resources

On Estimating the Maximum Domination Value and the Skyline Cardinality of Multi-Dimensional Data Sets

In recent years there has been increasing interest in query processing techniques that take into consideration the dominance relationship between items to select the most promising ones, based on user preferences. Skyline and top-k dominating queries are examples of such techniques. A skyline query computes the items that are not dominated, whereas a top-k dominating query returns the k items with...


Skyline Operator on Anti-correlated Distributions

Finding the skyline in a multi-dimensional space is relevant to a wide range of applications. The skyline operator over a set of d-dimensional points selects the points that are not dominated by any other point on all dimensions. Therefore, it provides a minimal set of candidates for the users to make their personal trade-off among all optimal solutions. The existing algorithms establish both t...


An Effective Probabilistic Skyline Query Process on Uncertain Data Streams

With the evolution of technology, the ways to acquire data and the applications of data have become more diverse. As data volume continuously grows, the data quality may not be as high as usual. The data can be defective, imprecise or inaccurate due to the process of data acquisition. Recently, the skyline query has been widely used in data analysis to derive the results that meet more than one specific condition...


An Algorithm for Retrieving Skyline Points based on User Specified Constraints using the Skyline Ordering

Given a multidimensional data set, a skyline query returns the interesting points that are not dominated by other points. The actual cardinality (s) of a skyline query result may vary substantially from the desired result cardinality (k). An approach called skyline ordering is used that forms a skyline-based partitioning of a given data set and provides an ordering among the partitions. The con...


Skyline Ordering: A Flexible Framework for Efficient Resolution of Size Constraints on Skyline Queries

Given a set of multi-dimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. This paper goes further by addressing the general ca...


Publication date: 2008